Word Alignment Step by Step
Author
Abstract
In this paper, the current stage of the Uppsala Word Aligner (UWA) is described. The system has been developed within PLUG, a project on parallel texts that focuses on the analysis of bilingual text collections with Swedish as either the source or the target language. UWA comprises a set of knowledge-lite approaches to word alignment and lexicon extraction. A distinctive feature is its modularity. The article introduces the main principles of the alignment software, describes different configurations and approaches, and presents examples of alignment results.
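The abstract does not list UWA's individual knowledge-lite approaches. The sketch below illustrates one common knowledge-lite technique for lexicon extraction: scoring word pairs in sentence-aligned text with the Dice coefficient, 2·f(s,t) / (f(s) + f(t)). It is not UWA's implementation. The function name `dice_lexicon`, the whitespace tokenization, the 0.5 threshold, and the toy Swedish-English sentences are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product


def dice_lexicon(bitext, threshold=0.5):
    """Build a one-best translation lexicon from sentence-aligned text.

    Hypothetical helper. bitext: list of (source_sentence, target_sentence)
    string pairs. Returns {source_word: target_word} for pairs whose Dice
    score 2 * f(s, t) / (f(s) + f(t)) reaches `threshold`.
    """
    src_freq, trg_freq, pair_freq = Counter(), Counter(), Counter()
    for src, trg in bitext:
        # Naive whitespace tokenization (an assumption); each word type is
        # counted at most once per sentence pair.
        s_types = set(src.lower().split())
        t_types = set(trg.lower().split())
        src_freq.update(s_types)
        trg_freq.update(t_types)
        pair_freq.update(product(s_types, t_types))

    best = {}
    # Sorting makes ties deterministic: the alphabetically first target wins.
    for (s, t), joint in sorted(pair_freq.items()):
        dice = 2 * joint / (src_freq[s] + trg_freq[t])
        if dice >= threshold and dice > best.get(s, ("", 0.0))[1]:
            best[s] = (t, dice)
    return {s: t for s, (t, _) in best.items()}


if __name__ == "__main__":
    # Toy Swedish-English sentence pairs (illustrative, not PLUG data).
    bitext = [
        ("huset är rött", "the house is red"),
        ("huset är stort", "the house is big"),
        ("bilen är röd", "the car is red"),
    ]
    lexicon = dice_lexicon(bitext)
    print(lexicon["huset"])  # -> house
```

In this toy run, `röd` is linked to `car` rather than `red` because both appear exactly once, in the same sentence pair. This shows why a single co-occurrence measure is noisy on small data and is often combined with other cues, such as string similarity for cognates.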
Similar resources
A Framework of 2-step Bilingual Alignment for SMT: in Case Study of Thai-English Translation
This paper presents a framework for a new word alignment process that can be used in SMT development. The method was designed to combine the reliability of a dictionary as prior knowledge with the ability of co-occurrence statistics to handle unknown words. The alignment method is split into two separate steps: first, a dictionary-based step to guarantee accurate word alignment, and second, co-occu...
Semi-Supervised Training for Statistical Word Alignment
We introduce a semi-supervised approach to training for statistical machine translation that alternates the traditional Expectation Maximization step, applied to a large training corpus, with a discriminative step aimed at increasing word-alignment quality on a small, manually word-aligned sub-corpus. We show that our algorithm leads not only to improved alignments but also to machine tra...
Semi-supervised Word Alignment with Mechanical Turk
Word alignment is an important preprocessing step for machine translation. The project aims to incorporate manual alignments from Amazon Mechanical Turk (MTurk) to help improve word alignment quality. As a global crowdsourcing service, MTurk can provide a flexible and abundant labor force and therefore reduce the cost of obtaining labels. An easy-to-use interface is developed to simplify the lab...
Refining Kazakh Word Alignment Using Simulation Modeling Methods for Statistical Machine Translation
Word alignment plays an important role in the training of statistical machine translation systems. We present a technique to refine word alignments at the phrase level after collecting sentences from Kazakh-English parallel corpora. The estimation technique extracts phrase pairs from the word alignment and then incorporates them into the translation system for further steps. Although ...
Using Tectogrammatical Alignment in Phrase-Based Machine Translation
In this paper, we describe an experiment whose goal is to improve the quality of machine translation. Phrase-based machine translation, which is the state of the art in statistical machine translation, learns its phrase tables from large parallel corpora, which have to be aligned at the word level. The most common word-alignment tool is GIZA++. It is general-purpose and language ind...
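The two-step framework summarized under "A Framework of 2-step Bilingual Alignment for SMT" uses a dictionary-based step first, then co-occurrence statistics for words the dictionary does not cover. It can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' method. The function names, the greedy first-match linking in step 1, the raw co-occurrence counts in step 2, the dictionary entries, and the toy Swedish-English data are all hypothetical.

```python
from collections import Counter
from itertools import product


def count_cooccurrences(bitext):
    """Count how often each (source, target) word pair shares a sentence pair."""
    counts = Counter()
    for src, trg in bitext:
        counts.update(product(set(src), set(trg)))
    return counts


def two_step_align(src, trg, dictionary, cooc):
    """Link token positions of one sentence pair in two passes (hypothetical).

    src, trg: token lists; dictionary: set of (source_word, target_word) pairs
    used as prior knowledge; cooc: Counter from count_cooccurrences.
    Returns a sorted list of (source_index, target_index) links.
    """
    links = {}
    used_trg = set()
    # Step 1: trust the dictionary and link each source word to the first
    # still-unlinked target word it lists as a translation.
    for i, s in enumerate(src):
        for j, t in enumerate(trg):
            if j not in used_trg and (s, t) in dictionary:
                links[i] = j
                used_trg.add(j)
                break
    # Step 2: for source words the dictionary did not cover, pick the unlinked
    # target word with the highest co-occurrence count (ties: leftmost word).
    for i, s in enumerate(src):
        if i in links:
            continue
        best_j, best_count = None, 0
        for j, t in enumerate(trg):
            if j not in used_trg and cooc[(s, t)] > best_count:
                best_j, best_count = j, cooc[(s, t)]
        if best_j is not None:
            links[i] = best_j
            used_trg.add(best_j)
    return sorted(links.items())


if __name__ == "__main__":
    bitext = [
        (["huset", "är", "rött"], ["the", "house", "is", "red"]),
        (["bilen", "är", "röd"], ["the", "car", "is", "red"]),
        (["en", "röd", "bil"], ["a", "red", "car"]),
    ]
    cooc = count_cooccurrences(bitext)
    dictionary = {("bilen", "car"), ("är", "is")}
    print(two_step_align(bitext[1][0], bitext[1][1], dictionary, cooc))
```

Here the dictionary fixes `bilen`/`car` and `är`/`is` first. That leaves `röd` to be linked to `red` by co-occurrence. With an empty dictionary, the same counts would link `bilen` to `the` and `röd` to `car`, which is the kind of error the dictionary step is meant to prevent.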